Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases
Authors
Abstract
Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs, owing to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. In reality, however, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that the edges of an uncertain graph are independent of each other, we study uncertain graphs in which edge occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; we therefore employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds on the subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds on the subgraph isomorphism probability. Based on PMI, we can filter out a large number of probabilistic graphs and maximize the pruning capability. In the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of the proposed solutions is verified through extensive experiments.
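The general filter-and-verify idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's PMI-based method: the lower and upper bounds are supplied by hand rather than derived from an index, edges are assumed to be independent (the paper models correlated edges), and the `matches` predicate is a hypothetical stand-in for a real subgraph similarity test. The helper names `sample_world`, `estimate_probability`, and `filter_and_verify` are likewise illustrative. A graph whose upper bound falls below the threshold is pruned, a graph whose lower bound already meets it is accepted without verification, and only the inconclusive candidates are checked by Monte Carlo sampling of possible worlds, a common fallback when exact computation is #P-hard.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int]
# Hypothetical simplification: each edge maps to an independent existence
# probability. The paper itself models correlated edges, which this sketch does not.
ProbGraph = Dict[Edge, float]
World = FrozenSet[Edge]


def sample_world(g: ProbGraph, rng: random.Random) -> World:
    """Draw one possible world by keeping each edge independently with its probability."""
    return frozenset(e for e, p in g.items() if rng.random() < p)


def estimate_probability(g: ProbGraph, matches: Callable[[World], bool],
                         n_samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[matches(world)] over sampled possible worlds."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if matches(sample_world(g, rng)))
    return hits / n_samples


def filter_and_verify(candidates: Dict[str, ProbGraph],
                      bounds: Dict[str, Tuple[float, float]],
                      matches: Callable[[World], bool],
                      threshold: float,
                      n_samples: int = 2000) -> List[str]:
    """Return ids of graphs whose match probability is (estimated to be) >= threshold.

    bounds[gid] = (lower, upper) is assumed to be a valid bracket on the true
    probability; here it is supplied by the caller instead of an index such as PMI.
    """
    answers = []
    for gid, g in candidates.items():
        lower, upper = bounds[gid]
        if upper < threshold:      # filter: can never reach the threshold, prune
            continue
        if lower >= threshold:     # filter: guaranteed answer, skip verification
            answers.append(gid)
            continue
        # verify: bounds are inconclusive, so fall back to sampling
        if estimate_probability(g, matches, n_samples) >= threshold:
            answers.append(gid)
    return answers


if __name__ == "__main__":
    # Toy query predicate (hypothetical): the world must contain the path 0-1-2.
    def contains_path(world: World) -> bool:
        return {(0, 1), (1, 2)} <= world

    graphs = {
        "g1": {(0, 1): 0.9, (1, 2): 0.9},  # true probability 0.81
        "g2": {(0, 1): 0.1, (1, 2): 0.1},  # true probability 0.01
        "g3": {(0, 1): 0.5, (1, 2): 0.5},  # true probability 0.25
    }
    hand_bounds = {"g1": (0.7, 0.95), "g2": (0.0, 0.05), "g3": (0.1, 0.6)}
    print(filter_and_verify(graphs, hand_bounds, contains_path, threshold=0.5))
    # prints ['g1']
```

In this toy run, `g1` is accepted by its lower bound and `g2` is pruned by its upper bound, so only `g3` pays the cost of sampling. The tighter the bounds, the fewer candidates reach the expensive verification step, which is the motivation for the tight bounds the abstract emphasizes.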
Similar resources
Efficient Matching and Indexing of Graph Models in Content-Based Retrieval
In retrieval from image databases, evaluation of similarity, based both on the appearance of spatial entities and on their mutual relationships, depends on content representation based on Attributed Relational Graphs. This kind of modeling entails complex matching and indexing, which presently prevents its usage within comprehensive applications. In this paper, we provide a graph-theoretical fo...
An Efficient Probabilistic Approach for Graph Similarity Search
Graph similarity search is a common and fundamental operation in graph databases. One of the most popular graph similarity measures is the Graph Edit Distance (GED) mainly because of its broad applicability and high interpretability. Despite its prevalence, exact GED computation is proved to be NP-hard, which could result in unsatisfactory computational efficiency on large graphs. However, exac...
Comparative Survey of Query Processing on Graph Databases
Graph Databases are rapidly increasing in popularity, size and application. Currently, graph query processing involves some form of isomorphism test, which results in very high response times. Indexing is the most popular way to optimize query processing times. In this paper, we compare some of the existing work on subgraph query processing including cIndex, gIndex and FG-Index. There is a prec...
Subgraph Isomorphism Search in Massive Graph Databases
Subgraph isomorphism search is a basic task in querying graph data. It consists of finding all embeddings of a query graph in a data graph. It is encountered in many real-world applications that require the management of structural data, such as bioinformatics and chemistry. However, subgraph isomorphism search is an NP-complete problem which is prohibitively expensive in both memory and time in mas...
SIMCOMP/SUBCOMP: chemical structure search servers for network analyses
One of the greatest challenges in bioinformatics is to shed light on the relationship between genomic and chemical significances of metabolic pathways. Here, we demonstrate two types of chemical structure search servers: SIMCOMP (http://www.genome.jp/tools/simcomp/) for the chemical similarity search and SUBCOMP (http://www.genome.jp/tools/subcomp/) for the chemical substructure search, where b...
Journal title:
- PVLDB
Volume 5, Issue
Pages -
Publication date: 2012